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Abstract 

I present an analytic method for estimating the errors in fitting a distribution. 
A well-known theorem from statistics gives the minimum variance bound (MVB) 
for the uncertainty in estimating a set of parameters Aj, when a distribution func- 
tion F{z; Xi...Xm) is fit to N observations of the quantity(ies) z. For example, 
a power-law distribution (of two parameters A and A) is F{z;A,A) = Az~^. I 
present the MVB in a form which is suitable for estimating uncertainties in prob- 
lems of astrophysical interest. For many distributions, such as a power-law distri- 
bution or an exponential distribution in the presence of a constant background, the 
MVB can be evaluated in closed form. I give analytic estimates for the variances 
in several astrophysical problems including the gallium solar-neutrino experiments 
and the measurement of the polarization induced by a weak gravitational lens. I 
show that it is possible to make significant improvements in the accuracy of these 
experiments by making simple adjustments in how they are carried out or ana- 
lyzed. The actual variance may be above the MVB because of the form of the 
distribution function and/or the number of observations. I present simple meth- 
ods for recognizing when this occurs and for obtaining a more accurate estimate 
of the variance than the MVB when it does. 

Subject Headings: gravitational lensing - methods: statistical - solar neutrinos 



1. Introduction 



A very general problem is to fit a set of observations to a distribution function 
of one or several parameters. For example, one might want to fit the observed 
projected separation of binaries, s, to a power-law function of the form, f{s)ds = 
As~^. In this case there arc two parameters, the index £ and the normalization A. 
Or one might fit a point spread function to a two-dimensional gaussian of position 
{x, y) which has 6 parameters (1 for normalization, 2 for first moments, 3 for second 
moments). For any given data set, one can always estimate the errors using monte 
carlo or other techniques. However, it is often desirable to estimate the errors in 
advance of the observations in order to optimize the observing program. Moreover, 
even if the final error estimate is made by monte carlo, one would often like to have 
an external analytic check on the estimate. 

A well-known theorem in statistics (e.g. Kendall, Stuart, & Ord 1991) gives a 
minimum variance bound (MVB) for any linear combination of the parameters. It 
is often the case that the actual variance is equal to the MVB and that the MVB 
can be evaluated in closed form. This makes the MVB potentially a very useful 
tool. Nevertheless, standard texts do not generally make clear how to evaluate 
this bound in practical problems nor do they give guidance on how to determine if 
the actual variance is roughly equal to the MVB nor how to estimate the variance 
when it is well above the MVB. 

Here I present the MVB in a form that allows its direct evaluation. I present 
several apphcations of astrophysical interest for which the MVB can be evaluated 
in closed form. I give a derivation of the MVB which allows one to see exphcitly 
the conditions under which the true variance exceeds the minimum. When this 
happens, it is usually possible to modify the MVB formula to give a reasonable 
estimate of the true variance. 

Suppose that one makes N observations from a distribution F which is a func- 
tion of m parameters Ai...A^ and p known quantities which latter I collectively 
label z. In the first example given above p — 1 and z — s. In the second, p — 2 
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and z = {x,y). Then in the large hmit and subject to certain conditions, the 
covariance between the measurements of Aj and is given by 



cov(Ai, Xj) = Cij, 



c = b 



-1 



(1.1) 



where 




dlnF dlnF 



dXi dXj 



(1.2) 



and where 



(g) = / dPzF{z;Xi...Xm)g{z) / / dPz F{z; Xi...Xm). (1.3) 




The variances of the Aj are then just the diagonal elements of c, var(Ai) = ca. 

In § 3, I prove this formula in the limit of large N. In § 4, I show how to modify 
the formula when it diverges. I also show that if N is not large, then one must 
smooth F over the sampling scale before differentiating or alternatively, exclude 
from the integrals those parts of observation space that are not well sampled. First, 
I present a few applications. 



2.1. Functions With Normalization Plus One Parameter 

Some of the most important applications of equation (1.1) are for distributions 
of one variable which have a normalization plus one other parameter. That is. 



2. Examples 



F{z;Xi,X2)^Af{z;A), 



(2.1) 



with Xi = A and A2 = A. Then 




(2.2) 
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The variance in the parameter A is then 



var(A) = - 



dlnf 
dA 



dlnf 
dA 



(2.3) 



2.2. Power Law 

Suppose that F is a power law, F(z) — Af(z) with f(z) = z~^. And suppose 
that the observations cover a range zi < z < Z2. Then d\n.f/dz — —\n.z. It is 
straight forward to substitute this expression into equation (2.3) and to evaluate 
the resulting integrals: 



var(A) =— 



„A-1 



(Inr)^ 



=-(lnr 



(A -1)2 

-2 



(r^-i - 1)2 

(A=l), 



(A 7^1) 



(2.4) 



where r = Z2/z\ and the evaluation is to be done at the best-fit value of A. 



2.3. Exponential Law In A Cone 



Suppose that one wants to fit the distribution of stars observed in a cone 
(say toward the north galactic pole) to an exponential. (I have previously solved 
this problem by more cumbersome methods, Gould 1989). Taking account of the 
volume element as a function of distance, one finds f{z) = 2;2 exp(— A2;). Then 
dlnf/dA = —z. Suppose that the observations cover a range from to 2;*. Then 



var(A) 



1 

N 



{z') - {zy 



A2 

N 



12 



Q4{Az^) 
Q2{Az^) 



-9 



Q2{Az^) 



n 2n -1 



(2.5) 



where 



Qr{x) 



exp 



i=0 



(2.6) 
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2.4. The Gallium Solar Neutrino Experiments 



Consider first the general problem of finding the normalization ^ to a known 

function f(t) in the presence of a constant but unknown background B, over an 
interval of observations (0, T). That is, F{t) = B + Af{t). Substituting into 
equation (1.1), one finds 



yax{A) 



dtF(t) - 

lodt/Fit) 



(2.7) 



For the special case of f{t) = exp(— t/r), equation (2.7) becomes. 



var(yl) 



At 



B ln(l + A/B) 
^~A W 



W = l-^ln{l + A/B), (2.8) 



where I have assumed exp(— T/r) <S B/A. Note that the quantity At is just 
the total 'signal', so that the expression in square brackets is the suppression of 
statistical significance due to the presence of background. Even if the background 
is known a priori, there is still a suppression factor given by setting W ^ 1. Also 
note that in the limit of high background, var(y4)/y4^ — > tA'^/2B. 

Equation (2.8) can be applied to radio-chemical solar-neutrino detection ex- 
periments, such as the gallium experiments now being conducted by GALLEX 
(Anselmann et al. 1993; Anselmann et al. 1994) and by SAGE (Abazov et al. 
1991). In this case the total number of expected events. At, is a function of the 
length of time. At, that the gallium is left in the detector before being extracted. 
That is, 

A^Q[l-ex.p(At/T)]. (2.9) 

where Q is a constant that depends only on the (unknown) flux of solar neutrinos 
and the (known) characteristics of the apparatus. Equation (2.8) can be used to 
determine the optimal exposure time At and counting time T which maximize the 
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efficiency of the experiment. For illustrative purposes, I will assume that T is fixed 
at T = 180 days. The total length of the experiment, P is fixed. The total number 
of runs is then P/ (At + t^), where is the time required to do the extraction before 
a new run can be started. If is chosen to be small, each run will constrain the 
unknown rate Q relatively weakly, but there will be many runs. On the other hand, 
if At is large, there will be better constraints from each run, but fewer runs. The 
statistical power of the experiment (inverse square fractional error in Q) is given 

by 



2 



var(Q) Ai + U 



1 - 



B ln(l + A/B) 
A W 



(2.10) 



In practice, the signal is measured by two sets of counters, one sensitive to capture 
of K electrons and the other to capture of L electrons. The two counters each have 
their own backgrounds {Bn and Bl) which are different for each run. They also 
have their own efficiencies which lead to slightly different values of Qk and Qj^. The 
total statistical power of the experiment is found by adding the statistical powers 
from each of these channels. To make my estimates for the GALLEX experiment, 
I adopt parameters (Anselmann et al. 1994; P. Anselmann 1994, private communi- 
cation) T = 16.5 days, = 1.5 days, P = 4yr. Bk = 0.02day-\ Bl = 0.06day-\ 
Qk = 0.11 day~\ and = 0.12 day~^. I find that the peak value at At = 14 days 
is TV = 80 which is ~ 13% higher than the value at At = 27 days, the exposure 
time currently adopted by the GALLEX experiment. I find the optimal exposure 
time for the SAGE experiment is also about 2 weeks. 



2.5. Centroid of a Distribution 



Suppose that a distribution is a known, circularly symmetric function around 
an imknown coordinate center (X,Y). That is F(x,y; X,Y) — Af(r), with = 
{x — X)'^ + {y — y)"^- In this case, there are three parameters, Ai — A, X2 — X, 
and A3 = Y. The problem greatly simplifies, however, because all the off-diagonal 
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elements of bij vanish. Thus, var(X) = I/622, or 



var(X) = var(y) = 




(2.11) 



From equation (2.11), one immediately finds that for a gaussian var(X) — 
(T^/N, where a is the standard deviation of the distribution. One might want to 
generalize from this to infer that the error in the centroid always scales as the 
standard deviation of the distribution. However, for distributions more sharply 
peaked than a gaussian the variance can be much smaller than this naive estimate. 

2.6. Statistics of Weak Gravitational Lenses 

An important example of centroiding a sharply peaked distribution arises in the 
statistics of weak gravitational lensing of distant galaxies. The observed eUipticity 
of each galaxy can be represented by a 2-dimensional vector whose magnitude is the 
ratio of the difference to the sum of the major and minor axes and whose direction 
is twice the position angle of the major axis. The effect of a weak gravitational 
lens is to change the ellipticity of an image ej relative to that of the object 60 
by a 2-dimensional vector p, called the polarization. That is, p = — Bq. The 
mean polarization in a given region of the sky can be determined by comparing 
the centroid of the ellipticities of the galaxy images with the centroid of the object 
ellipticities, which latter is assumed to be statistically consistent with zero. 

I take the underlying true distribution of ellipticities, e, to be 

^(e) = ^Hi^^, < e < e„,ax = 0.8, (2.12) 

consistent with the ellipticities observed by Mould et al. (1994) after convolution 
with their measurement errors (J. Villumsen 1994, private communication). I take 
the observed distribution (in the absence of lensing), /(e), to be this underlying 
distribution convolved with a gaussian measurement error, a. I then use equation 
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(2.11) to evaluate the error 5p in determining each component of the polarization. 
The statistical power (inverse variance per galaxy observed) as a function of a is 
shown by the solid line in Figure 1. Also shown (dashed line) is the statistical 
power if the polarization is estimated by taking the unweighted sum of the galaxy 
ellipticities, a procedure which is customary in virtually all observational studies 
(e.g. Fahlman et al. 1994; Mould et al. 1994; Small et al. 1994) and most theoretical 
studies as well. Note that if the errors are fairly large {a ^ 15%), then the naive 
approach of equal weighting is almost as good as the MVB obtained if one uses 
maximum likelihood. The reason for this is that if a is large, the central portion of 
the distribution /(e) is dominated by gaussian errors and so is nearly gaussian. It 
is straight forward to show that when fitting the centroid of a gaussian distribution, 
maximum likelihood is equivalent to equally weighting the data. Note also however, 
that if the errors are relatively small {a ^ 5%), then equal weighting wastes a large 
fraction of the information available in the data. 



Here I prove equation (1.1). Consider a small patch of observation space of 
volume (A^;)^ around a point z. Suppose that the parameters \\...X^ have been 
chosen so that they are at or very near the best fit. The predicted number of 
observations falling within the volume is Uprcd = F{z] \\...X*^){AzY , while the 
number actually observed is a possibly different number, nobs- I work in the limit 
of large n, so that |(npred/?T'obs) — 1| <^ 1. I form by summing over all such 



3. Derivation 



patches 




(3.1) 



I then linearize the equation in the neighborhood of the to obtain a final cor- 



8 



rection AXj, 



^ ^pred,fc ' (3.2) 

To minimize x^, I differentiate and set dy^ / d/Wi = which yields a set of hnear 
equations for the final correction AXj to the initial estimates Ay, 



where 



, ^ {Az)P dF{zk]\l...\m) 

= 2^ ^7 



fdF{zk;Xi->^m)\ [dF{zk-\i...\ 



t ^pred,fc V d\i ) 



(3.3) 



The solution is given by AA^ = Cijdj, where c = b~^. More important from the 
present perspective, the covariance of AA^ with AAj (and hence the covariance of 
Aj with Xj) is given by Cy. (See Press et al. 1986). I evaluate 6y by converting the 
sum to an integral and find 

/•^p 1 OF OF _ ^J d\nFd\nF \ 
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4. Range of Validity 



There are some distribution functions for which equation (1.1) is obviously not 
valid. Consider, for example the distribution 

F{z) = A{K-z), (4.1) 

over the interval (0, A). Using the formahsm of §2, 1 find var(A) = {A^[((A — z)~'^) — 
((A — z)"^) ]}~^. The second term is finite, but the first is logarithmically diver- 
gent. The problem is that in the proof, I assumed that there are a large number 
of observations in every patch of observation space. However, in this example the 
expected number of observations in the interval (A — A^,A) is N{/S.z)'^ /K. For 
< (A/A^)^/^, this number is smaller than unity, so that the assumptions un- 
derlying the proof break down. In essence, the formal derivation treats this region 
of parameter space as contributing an infinite amount of information whereas it 
actually contributes almost none. A practical solution is to cut off the integral at 
z ^ K - {A/N)'^/^, which results in the estimate var(A) = (A2/Ar)/ln(ArAe-^). 
The choice of the cutoff is somewhat arbitrary, but the error in the estimate is 
logarithmic in the cut-off value. 

Even for distributions which give rise to finite integrals one may still ask how 
large the number of observations must be to reach the 'limit of large N\ For small 
N, one usually fits a distribution by maximum likelihood (ML) rather than the 
binned method that I modeled in the previous section. (Indeed, it is common 
to use ML even with large N. However, as I show below, the two methods are 
equivalent for large N.) Thus, to address this question, I begin by examining the 
relation between ML and binned fits. 

In the large N limit, the probability of observing nobs,fc given npi.cd,fc, is gaussian 
distributed. Hence, up to an irrelevant constant, — is the logarithm of the 
probability of making the observations given the model. One could imagine further 
sub-dividing each bin of volume {Az)p into J\f equal sub-bins, with J\f ^ npred,k- 
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The predicted number in each bin, Tj^ = n^xed,k/-^ 1) would then be far in the 
Poisson hmit, and would be distributed as 



P(n;T) = — exp(-T). (4.2) 

77". 



Nevertheless, the joint probability of observing one event in each of nobs,fc sub-bins 
and observing nothing in the remaining A/"— nobs,fe sub-bins is identical to the prob- 
ability of observing nobs,fc the whole bin and in particular is gaussian distributed. 
The reason that the extreme Poisson (and hence non-gaussian) probabilities com- 
bine to form a gaussian joint distribution is that the individual probability functions 
in the sub-bins are identical. This identity arises from the fact that the distribution 
function is effectively constant over the bin. Hence, in the large N limit (and up to 
an irrelevant constant) one can rewrite = — In L-|- const, where L is a product 
over extremely small bins of volume {5zy , 

L = X{ P{<^s,k; n), K^,,k = or 1), (4.3) 

k 

and where 

Tk^{Sz)PF{zk;Xi...Xm). (4.4) 
In the extreme Poisson limit n\ — 1, so that lnP(n; r) = nlnr — r. Hence, 



]nL = J2<^s,kHF{zk){Szr] -Y,F{zk){5zY 

^ ' (4.5) 

= InF(^fe) + N^-^^\n{5zY - A^p^ed- 

obs,fc 



where A^pred ^iid -^obs ^^e respectively the predicted and actual number of obser- 
vations. The second term depends only on the bin size and not the model and so 
is irrelevant. If one restricts F to a class of functions with the same normalization. 
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then the last term is also irrelevant. One then arrives at the standard likelihood 
formulation as a sum of the log probabilities over the observed events, 



In the large N limit, maximizing this likelihood function is equivalent to minimizing 
X^. (It also has the convenient advantage that no binning is required and for that 
reason is often used instead of x^-) The proof given in the previous section therefore 
also apphes to distributions fit by ML in the large N limit. When N is not large 
or if F is pathological, the proof breaks down and some care is required. 

Equation (4.1) is clearly one example of a pathological distribution, since the 
integral (((iln//(iA)^) diverges. What makes the integral pathological is that it 
has a very large (in this case, infinite) contribution from a region of observation 
space that in practice contributes very little information to the estimate. The 
integral was derived by assuming gaussian statistics over this region. Prom the 
above derivation of the ML formula, it is clear that this assumption is valid only 
in regions where there are a large number of observations with identical (or in 
practice, similar) probability distributions. If equation (4.1) were convolved with 
a gaussian of very small width, say A/ 100, then (((iln//(iA)^) would be formally 
convergent. However, equation (1.2) would still underestimate the errors unless N 
were large enough to probe the regions making large contributions to the integral. 
In general, then, equation (1.2) will yield the correct errors if F is first smoothed 
on the scale of the sampling. If F is already smooth on these scales, then equation 
(1.2) will be essentially correct. 

To get a practical sense of these requirements, consider the problem of the 
determining the weak lensing polarization as discussed in § 2.6. Suppose that the 
error in the ellipticity measurement were a = 2%. If the ellipticities of iV = 12 
galaxies were measured then according to Figure 1, the error in the polarization 
would be Sp ~ 1/30. However, there would be only ~ 1 galaxy with a measured 
polarization within 1/30 of the centroid. Hence, it is implausible that the centroid 




(4.6) 



obSjfe 
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could be measured this accurately. Unfortunately, in this case the integral (2.11) 
is power-law divergent near (until it is cut off by the observational errors) so 
the estimate of the variance is sensitive to the choice of a cut off. If one were 
interested in a very accurate estimate of the variance, a monte carlo simulation 
would be required. On the other hand, suppose that N — 300. In this case Figure 
1 predicts Sp ~ 1/150. The distribution function is smooth on scales of ~ 2% 
and there are 0(10) galaxies within 0.02 of the centroid. Thus, the MVB would 
provide a reasonable estimate. 

Acknowledgements: I would like to thank R. Lupton for making several 
helpful suggestions. 
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FIGURE CAPTIONS 



1) Statistical power (inverse square error per galaxy observed) for measurement 
of the polarization due to weak gravitational lensing {solid line). The nu- 
merical evaluation is made by convolving eq. (2.12) with a gaussian error 
with standard deviation a and substituting the result into eq. (2.11). Also 
shown {dashed line) is the statistical power if one uses equal weighting, the 
customary method of finding the centroid of the ellipticities of an observed 
set of galaxies. 
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